Fixing Boundary Violations

2.2 Type I Boundary Constraints

Type I constraints require that the dependent variable be bounded between the lower limit $a$ and the upper limit $b$. Considering the joint set of the maximum and minimum values of each independent variable ${{x}_{j}}$ (where $j$ indexes the variables), the boundary constraints for $\hat{y}$ are

$\begin{equation*} a\le \hat{y}^{\min }\le \hat{y}^{\max }\le b, \end{equation*}$
where

$\begin{align*} &\hat{y}^{\max}=\sum\limits_{j=0}^{m}{\left( v_{j}^{+}{{\beta }_{j}}x_{j}^{\max }+v_{j}^{-} {{\beta }_{j}}x_{j}^{\min } \right)}\\ &\hat{y}^{\min}=\sum\limits_{j=0}^{m}{\left( v_{j}^{+}{{\beta }_{j}}x_{j}^{\min }+v_{j}^{-} {{\beta }_{j}}x_{j}^{\max } \right)}, \end{align*}$
and $v_{j}^{+}$ and $v_{j}^{-}$ are indicator variables defined as

$\begin{equation*} v_{j}^{+}= \begin{cases} 1 & \text{if } {{\beta }_{j}} > 0 \\ 0 & \text{otherwise} \end{cases}, \quad v_{j}^{-}= \begin{cases} 1 & \text{if } {{\beta }_{j}} < 0 \\ 0 & \text{otherwise} \end{cases}. \end{equation*}$
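The definitions of $\hat{y}^{\max}$ and $\hat{y}^{\min}$ can be illustrated with a minimal Python sketch. The function names `prediction_bounds` and `satisfies_type1` and the coefficient values are hypothetical and chosen only for illustration. The sketch also assumes that $j=0$ denotes an intercept whose covariate is fixed at 1, which the text above does not state explicitly.

```python
from typing import Sequence, Tuple


def prediction_bounds(
    beta: Sequence[float],
    x_min: Sequence[float],
    x_max: Sequence[float],
) -> Tuple[float, float]:
    """Return (y_hat_min, y_hat_max) over the joint set of covariate extremes."""
    y_hat_min = 0.0
    y_hat_max = 0.0
    for b_j, lo, hi in zip(beta, x_min, x_max):
        v_plus = 1 if b_j > 0 else 0   # indicator for a positive coefficient
        v_minus = 1 if b_j < 0 else 0  # indicator for a negative coefficient
        # A positive coefficient contributes most at x_max, a negative one at x_min.
        y_hat_max += v_plus * b_j * hi + v_minus * b_j * lo
        # The minimum prediction uses the opposite extremes.
        y_hat_min += v_plus * b_j * lo + v_minus * b_j * hi
    return y_hat_min, y_hat_max


def satisfies_type1(
    beta: Sequence[float],
    x_min: Sequence[float],
    x_max: Sequence[float],
    a: float,
    b: float,
) -> bool:
    """Check a <= y_hat_min <= y_hat_max <= b."""
    y_hat_min, y_hat_max = prediction_bounds(beta, x_min, x_max)
    return a <= y_hat_min <= y_hat_max <= b


# Hypothetical example: intercept (assumed x_0 = 1) plus two covariates in [0, 1].
beta = [0.2, 0.5, -0.3]
x_min = [1.0, 0.0, 0.0]
x_max = [1.0, 1.0, 1.0]

y_hat_min, y_hat_max = prediction_bounds(beta, x_min, x_max)
ok = satisfies_type1(beta, x_min, x_max, a=0.0, b=1.0)
```

With these illustrative numbers, $\hat{y}^{\min}$ is $0.2 - 0.3 = -0.1$, which falls below $a = 0$, so the check reports a violation even though $\hat{y}^{\max} = 0.7$ respects the upper limit.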

It is not uncommon for the truncated regression model to give a solution that violates the boundary constraints of the dependent variable. Such a prediction has no meaningful interpretation, since the data-generating process forbids its occurrence. For example, it makes no sense to predict that a party will get $105\%$ or $-5\%$ of the votes, given that the vote share is bounded between $0\%$ and $100\%$. Here we use the joint set of maximum and minimum covariate values because the predicted values of all empirical observations may be admissible while certain combinations of the covariate values still result in boundary violations. For example, no empirical combination of ${{x}_{ji}}$ may generate an out-of-bounds prediction ${{\hat{y}}_{i}}$, yet some combination of $x_{j}^{\max }$ and $x_{j}^{\min }$ might give an inadmissible prediction larger than $b$ or smaller than $a$.⁵ Unless we have a reason to rule out the joint presence of these extreme values, we should evaluate type I boundary violations by including all possible combinations of the covariate values.⁶
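The gap between empirical admissibility and joint admissibility can be made concrete with a short Python sketch. All numbers, and the names `predict`, `observed`, and `corners`, are hypothetical. The first column is assumed to be an intercept fixed at 1, and the bounds $a = 0$ and $b = 1$ mimic a vote share.

```python
from itertools import product


def predict(beta, x):
    """Linear prediction: sum over j of beta_j * x_j."""
    return sum(b_j * x_j for b_j, x_j in zip(beta, x))


a, b = 0.0, 1.0            # lower and upper limits of the dependent variable
beta = [0.1, 0.5, 0.6]     # hypothetical coefficients (intercept first)

# Hypothetical data: no row has both covariates at their maximum.
observed = [
    [1, 0.0, 1.0],
    [1, 1.0, 0.0],
    [1, 0.5, 0.5],
]

# Every empirical prediction lies within [a, b].
observed_ok = all(a <= predict(beta, row) <= b for row in observed)

# Build every combination of per-covariate minimum and maximum values.
col_min = [min(col) for col in zip(*observed)]
col_max = [max(col) for col in zip(*observed)]
corners = sorted(set(product(*zip(col_min, col_max))))

# Corners whose prediction falls outside [a, b].
violations = [c for c in corners if not a <= predict(beta, c) <= b]
```

Here `observed_ok` is true, yet the combination in which both covariates take their maximum gives $0.1 + 0.5 + 0.6 = 1.2 > b$, so `violations` contains that corner. Enumerating corners grows exponentially in the number of covariates; it is used here only to make the argument visible, not as a recommended procedure.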

____________________

Footnote

⁵ Imagine our model predicts that IQ score and study hours are both positively related to SAT score, but the data contain no case in which both variables take their maximum values. Such a case is very likely to exist in reality, so our model should not generate an out-of-bounds predicted SAT score for it.

⁶ When a set of covariates is composed of regional dummy variables, their joint presence is impossible; therefore, only the maximum and minimum coefficients of those dummies are specified in the boundary constraints of $y_{i}$.